Romanian Valence Dictionary in XML Format

Authors

  • Ana-Maria Barbu
  • Emil Ionescu
  • Verginica Barbu Mititelu
Abstract

Valence dictionaries inventory logical predicates (most often verbs) together with semantic and syntactic information about the roles of the arguments they combine with, as well as the syntactic restrictions those arguments must obey. In this article we present the initial stage of the project “Syntactic and semantic database in XML format: an HPSG representation of verb valences in Romanian”. Its aim is to develop a valence dictionary in XML format for a set of 3000 Romanian verbs. Valences are specified for each sense of each verb, along with an illustrative example, possible argument alternations, and a set of multiword expressions in which the verb occurs with that sense. The grammatical formalism we use is Head-driven Phrase Structure Grammar (HPSG), which offers one of the most comprehensive frameworks for encoding various types of linguistic information for lexical items. XML is the most appropriate mark-up language for describing information structured in the HPSG framework. The project can later be extended to cover all Romanian verbs (around 7000) as well as other predicates (nouns, adjectives, prepositions).
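To make the kind of entry described in the abstract more concrete, here is a minimal sketch of how a per-sense valence frame might be stored in XML and read back. The element and attribute names (entry, sense, valence, arg, role, function, category, case) are assumptions made for illustration only; the abstract does not specify the project's actual schema, and the sample entry for the verb "da" (give) is invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: element and attribute names are illustrative only,
# not the schema used by the project described above.
ENTRY_XML = """
<entry lemma="da">
  <sense id="1" gloss="give">
    <valence>
      <arg role="agent" function="subject" category="NP" case="nom"/>
      <arg role="theme" function="direct-object" category="NP" case="acc"/>
      <arg role="recipient" function="indirect-object" category="NP" case="dat"/>
    </valence>
    <example>Ion dă o carte Mariei.</example>
  </sense>
</entry>
"""


def read_valences(xml_text):
    """Map (lemma, sense id) to a list of (role, function, category, case)."""
    root = ET.fromstring(xml_text)
    frames = {}
    for sense in root.iter("sense"):
        key = (root.get("lemma"), sense.get("id"))
        frames[key] = [
            (a.get("role"), a.get("function"), a.get("category"), a.get("case"))
            for a in sense.findall("./valence/arg")
        ]
    return frames


frames = read_valences(ENTRY_XML)
for key, args in frames.items():
    print(key, args)
```

Keying the frames by (lemma, sense id) mirrors the abstract's statement that valences are specified for each sense of each verb; argument alternations and multiword expressions could be added as further child elements of each sense in the same way.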

Similar resources

Romanian Lexical Data Bases: Inflected and Syllabic Forms Dictionaries

This paper presents two lexical databases for Romanian: RoMorphoDict, a dictionary of inflected forms, and RoSyllabiDict, a dictionary of syllabified inflected forms. Each database is available in two Unicode formats: text and XML. An entry of RoMorphoDict, in text format, contains information on inflected form, its lemma, its morpho-syntactic description and the marking of the stressed vowel...

100K+ words, machine-readable, pronunciation dictionary for the Romanian language

This paper intends to present a newly developed Romanian language pronunciation dictionary called NaviRo. The dictionary contains more than 100k words from the DexOnline dictionary together with their phonetic transcriptions in Speech Assessment Method Phonetic Alphabet (SAMPA), a machine readable alphabet. The development of the pronunciation dictionary and the system architecture are also des...

An Optimal and Portable Parsing Method for Romanian, French, and German Large Dictionaries

This paper presents a cross-linguistic analysis of the largest dictionaries currently existing for Romanian, French, and German, and a new, robust and portable method for Dictionary Entry Parsing (DEP), based on Segmentation-Cohesion-Dependency (SCD) configurations. The SCD configurations are applied successively on each dictionary entry to identify its lexicographic segments (the first SCD conf...

Building a Generative Lexicon for Romanian

In this paper we present ongoing research: the construction and annotation of a Romanian Generative Lexicon (RoGL). Our system follows the specifications of the CLIPS project for Italian. It contains a corpus, a type ontology, a graphical interface and a database from which we generate data in XML format.

Tvärslå – defining an XML exchange format and then building an on-line Nordic dictionary

Tvärslå is a dynamically expandable multilingual on-line dictionary, composed of all dictionaries used and developed in the Nordisk netordbog (Nordic Web Dictionary) project. Currently the languages included are Swedish, Danish, Norwegian, Icelandic, Finnish and English. Tvärslå can be used both interactively and called by the Tvärsök system [1]. This article describes the functionality of Tvär...

Publication date: 2006